OCR exploration server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

The BBN Byblos Hindi OCR system

Internal identifier: 001368 (Main/Exploration); previous: 001367; next: 001369

Authors: Prem Natarajan [United States]; Ehry Macrostie [United States]; Michael Decerbo [United States]

RBID : Pascal:05-0361372

Abstract

The BBN Byblos OCR system implements a script-independent methodology for OCR using Hidden Markov Models (HMMs). We have successfully ported the system to Arabic, English, Chinese, Pashto, and Japanese. In this paper, we report on our recent effort in training the system to perform recognition of Hindi (Devanagari) documents. The initial experiments reported in this paper were performed using a corpus of synthetic (computer-generated) document images along with slightly degraded versions of the same that were generated by scanning printed versions of the document images and by scanning faxes of the printed versions. On a fair test set consisting of synthetic images alone we measured a character error rate of 1.0%. The character error rate on a fair test set consisting of scanned images (scans of printed versions of the synthetic images) was 1.40% while the character error rate on a fair test set of fax images (scans of printed and faxed versions of the synthetic images) was 8.7%.
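
The abstract reports character error rates, but this record does not say how they were scored. A common definition, assumed here, is the Levenshtein edit distance between the reference transcription and the OCR output, divided by the length of the reference. The sketch below illustrates that definition; the function names are hypothetical, and it counts Unicode code points, which may differ from the character unit the authors used for Devanagari.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning ref into hyp."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalised by reference length (assumed definition)."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character in a four-character reference.
    print(character_error_rate("abcd", "abed"))
```

Under this definition, one wrong character in a 100-character reference corresponds to a rate of 1%.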


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">The BBN Byblos Hindi OCR system</title>
<author>
<name sortKey="Natarajan, Prem" sort="Natarajan, Prem" uniqKey="Natarajan P" first="Prem" last="Natarajan">Prem Natarajan</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>BBN Technologies 10 Moulton Street</s1>
<s2>Cambridge, MA 02138</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Macrostie, Ehry" sort="Macrostie, Ehry" uniqKey="Macrostie E" first="Ehry" last="Macrostie">Ehry Macrostie</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>BBN Technologies 10 Moulton Street</s1>
<s2>Cambridge, MA 02138</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Decerbo, Michael" sort="Decerbo, Michael" uniqKey="Decerbo M" first="Michael" last="Decerbo">Michael Decerbo</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>BBN Technologies 10 Moulton Street</s1>
<s2>Cambridge, MA 02138</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">05-0361372</idno>
<date when="2005">2005</date>
<idno type="stanalyst">PASCAL 05-0361372 INIST</idno>
<idno type="RBID">Pascal:05-0361372</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000455</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000333</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000375</idno>
<idno type="wicri:doubleKey">1017-2653:2005:Natarajan P:the:bbn:byblos</idno>
<idno type="wicri:Area/Main/Merge">001405</idno>
<idno type="wicri:Area/Main/Curation">001368</idno>
<idno type="wicri:Area/Main/Exploration">001368</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">The BBN Byblos Hindi OCR system</title>
<author>
<name sortKey="Natarajan, Prem" sort="Natarajan, Prem" uniqKey="Natarajan P" first="Prem" last="Natarajan">Prem Natarajan</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>BBN Technologies 10 Moulton Street</s1>
<s2>Cambridge, MA 02138</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Macrostie, Ehry" sort="Macrostie, Ehry" uniqKey="Macrostie E" first="Ehry" last="Macrostie">Ehry Macrostie</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>BBN Technologies 10 Moulton Street</s1>
<s2>Cambridge, MA 02138</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Decerbo, Michael" sort="Decerbo, Michael" uniqKey="Decerbo M" first="Michael" last="Decerbo">Michael Decerbo</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>BBN Technologies 10 Moulton Street</s1>
<s2>Cambridge, MA 02138</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">SPIE proceedings series</title>
<idno type="ISSN">1017-2653</idno>
<imprint>
<date when="2005">2005</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">SPIE proceedings series</title>
<idno type="ISSN">1017-2653</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Arabic</term>
<term>Chinese</term>
<term>Document image processing</term>
<term>English</term>
<term>Error rate</term>
<term>Hidden Markov models</term>
<term>Imaging</term>
<term>Implementation</term>
<term>Japanese</term>
<term>Learning</term>
<term>Optical character recognition</term>
<term>Probabilistic approach</term>
<term>Testing equipment</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Reconnaissance optique caractère</term>
<term>Implémentation</term>
<term>Modèle Markov variable cachée</term>
<term>Arabe</term>
<term>Anglais</term>
<term>Chinois</term>
<term>Japonais</term>
<term>Apprentissage</term>
<term>Formation image</term>
<term>Traitement image document</term>
<term>Appareillage essai</term>
<term>Taux erreur</term>
<term>Approche probabiliste</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">The BBN Byblos OCR system implements a script-independent methodology for OCR using Hidden Markov Models (HMMs). We have successfully ported the system to Arabic, English, Chinese, Pashto, and Japanese. In this paper, we report on our recent effort in training the system to perform recognition of Hindi (Devanagari) documents. The initial experiments reported in this paper were performed using a corpus of synthetic (computer-generated) document images along with slightly degraded versions of the same that were generated by scanning printed versions of the document images and by scanning faxes of the printed versions. On a fair test set consisting of synthetic images alone we measured a character error rate of 1.0%. The character error rate on a fair test set consisting of scanned images (scans of printed versions of the synthetic images) was 1.40% while the character error rate on a fair test set of fax images (scans of printed and faxed versions of the synthetic images) was 8.7%.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
<region>
<li>Massachusetts</li>
</region>
</list>
<tree>
<country name="États-Unis">
<region name="Massachusetts">
<name sortKey="Natarajan, Prem" sort="Natarajan, Prem" uniqKey="Natarajan P" first="Prem" last="Natarajan">Prem Natarajan</name>
</region>
<name sortKey="Decerbo, Michael" sort="Decerbo, Michael" uniqKey="Decerbo M" first="Michael" last="Decerbo">Michael Decerbo</name>
<name sortKey="Macrostie, Ehry" sort="Macrostie, Ehry" uniqKey="Macrostie E" first="Ehry" last="Macrostie">Ehry Macrostie</name>
</country>
</tree>
</affiliations>
</record>
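
The record above uses the `wicri:` and `inist:` prefixes without declaring them, so a strict XML parser is likely to reject it as written. As a minimal sketch, assuming the record is available as a string with no XML declaration, it can be wrapped in an element that binds those prefixes to placeholder namespaces before parsing. The URIs, the `wrapper` element and the function names below are hypothetical and are not part of Dilib or Wicri.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumption): the real URIs are not given in the record.
PLACEHOLDER_NS = {
    "wicri": "urn:example:wicri",
    "inist": "urn:example:inist",
}

# Shortened excerpt of the record, for illustration only.
SAMPLE = """<record><TEI><teiHeader><fileDesc><titleStmt>
<title xml:lang="en" level="a">The BBN Byblos Hindi OCR system</title>
<author>
<name sortKey="Natarajan, Prem">Prem Natarajan</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01"><s1>BBN Technologies 10 Moulton Street</s1></inist:fA14>
</affiliation>
</author>
</titleStmt></fileDesc></teiHeader></TEI></record>"""


def parse_record(record_xml: str) -> ET.Element:
    """Parse a <record> string whose wicri:/inist: prefixes are undeclared."""
    decls = " ".join(f'xmlns:{p}="{u}"' for p, u in PLACEHOLDER_NS.items())
    # The wrapper element only exists to bring the prefixes into scope.
    wrapped = f"<wrapper {decls}>{record_xml}</wrapper>"
    return ET.fromstring(wrapped).find("record")


def title_and_authors(record: ET.Element):
    """Return the article title and the author names from titleStmt."""
    base = "./TEI/teiHeader/fileDesc/titleStmt"
    title = record.findtext(f"{base}/title")
    names = [n.text for n in record.findall(f"{base}/author/name")]
    return title, names


if __name__ == "__main__":
    print(title_and_authors(parse_record(SAMPLE)))
```

The same paths should apply to the full record, since the title and the three `author` elements sit directly under `titleStmt`.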

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001368 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001368 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:05-0361372
   |texte=   The BBN Byblos Hindi OCR system
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024